Develop#172
Merged
Merged
Conversation
… and git-workflow command RCA: Users and AI agents lacked structured onboarding, troubleshooting, and tool selection guidance when connecting a Provar project via ProvarDX MCP, causing long friction loops and repeated tool selection mistakes that baked-in agent workflows would prevent. Fix: Added three MCP prompts (provar.guide.onboarding, troubleshoot, orchestration), one resource (provar://docs/tool-guide), and a tracked /git-workflow slash command covering Jira ticket creation, branch naming, worktree setup with yarn install, and the full dev/PR lifecycle. Smoke test updated to 54 entries. Gitignore tightened to track .claude/commands/.
RCA: CI workflow pinned QualityOrchestrator at v1.0.0, requiring manual edits to pick up every subsequent patch or minor release, causing the action to drift behind the latest available version. Fix: Created floating v1 tag on mrdailey99/QualityOrchestrator (currently at v1.0.2) and updated CI_Execution.yml to reference @v1, so the workflow automatically uses the latest v1.x release without any further changes needed.
RCA: Copilot flagged 14 issues — wrong tool schemas in guide prompts and docs, missing build copy step, hardcoded cloudId in a public repo file, broken gitignored file reference. Fix: Add PROVAR_TOOL_GUIDE.md to package.json build copy; fix all wrong tool params in guide docs and prompts (properties_generate output_path, --plan-name, testrun_rca project_path, testcase_step_edit test_case_path, testplan add-instance hyphen, defect run_id); remove hardcoded cloudId; remove broken agents ref. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r-version PDX-0: chore(ci): update QualityOrchestrator to floating v1 tag
…ides PDX-0: feat(mcp): add user-facing guide prompts, tool-guide resource, and git-workflow command
… repo at release time RCA: NitroX component packages were statically bundled in the repo and not updated automatically; the source of truth is the ProvarTesting/factPackages GitHub repo (main branch), so packages would silently drift stale between releases. Fix: Added scripts/fetch-nitrox-packages.cjs to the prepack hook; it downloads all component package files from factPackages@main, regenerates NITROX_COMPONENT_CATALOG.md, and writes NITROX_CATALOG_SOURCE.json with the commit SHA. On failure (no token, network error) it logs a warning and falls back to the committed catalog — the release is never blocked. A new provar://nitrox/catalog-source MCP resource exposes the bundled version so consumers can verify which factPackages commit is in use.
…ings RCA: The factPackages repo stores component files under fact-*/src/components/ not fact-*/components/, so the path-matching regexes and catalog builder needed updating; additionally nine pre-existing unicorn/numeric-separators-style lint warnings in updateChecker.ts and its test file were left unaddressed. Fix: Updated PKG_JSON_RE and COMPONENT_FILE_RE in fetch-nitrox-packages.cjs to match the fact-*/src/ layout and adjusted buildCatalogFromDir to navigate the src/ subdirectory; ran eslint --fix on updateChecker.ts and updateChecker.test.ts to resolve all numeric-separator warnings, leaving the project at 0 lint errors and 0 warnings.
RCA: downloadRaw() used the branch name (main) in the raw URL rather than the resolved commit SHA, so files could be fetched from a different commit than the one the tree listing described; additionally both httpsGet and httpsGetBuffer had no timeout, meaning a stalled network connection would block prepack indefinitely. Fix: Added REQUEST_TIMEOUT_MS (15s) to both http helpers via req.setTimeout/req.destroy so hangs fail fast and fall through to the graceful fallback; updated downloadRaw to accept and use the commitSha parameter so all downloads are pinned to the same commit as the tree.
…ponent-package-retrieval PDX-463: feat(mcp): fetch NitroX component packages from factPackages repo at release time
…er startup (#158) RCA: No bin entry in package.json forced users through a two-step sf CLI plugin install before connecting Claude Desktop, creating unnecessary onboarding friction. Fix: Added provardx bin entry pointing to bin/mcp-start.js; lightweight ESM entrypoint parses mcp start flags, validates --allowed-paths as required, then delegates to the same server bootstrap used by the sf plugin path.
* PDX-464: feat(mcp): fetch NitroX schemas from internal source at build time RCA: FactComponent.schema and FactPackage.schema were bundled statically and never refreshed from the canonical internal source, risking stale schema validation in released packages. Fix: Extended fetch-nitrox-packages.cjs to download both schemas from the same commit SHA as the component catalog, write to src/mcp/rules/ and root-level copies, and record schemasUpdated in NITROX_CATALOG_SOURCE.json. Falls back to bundled schemas with a warning on any failure. * PDX-464: fix(mcp): address Copilot review — schema consumers, repo field, schemasUpdated normalisation RCA: Four review issues: docs incorrectly named runtime tools as schema consumers; repo field exposed internal URL in MCP resource; readCatalogSource did not normalise missing schemasUpdated from older build artifacts; fallback object also contained the internal URL. Fix: Corrected docs to describe IDE/SchemaStore as schema consumers; removed repo field from emitted JSON and fallback; normalised schemasUpdated to null in readCatalogSource try-path when field is absent; updated tests to cover the new normalisation and assert no repo field in fallback.
…alidate (#159) * PDX-466: feat(mcp): add AJV JSON schema validation alongside hardcoded NX rules RCA: provar_nitrox_validate only ran hardcoded NX001–NX010 semantic rules; structural errors (wrong types, extra properties, enum violations) encoded in FactComponent.schema.json were never caught at validation time. Fix: Added Ajv2020 as a runtime dependency; schema is lazily loaded from lib/mcp/rules/FactComponent.schema.json on first call and validated in parallel with existing rules. Violations are returned as NX_SCHEMA_<KEYWORD> issues (ERROR for type/required, WARNING for additionalProperties/pattern/enum). Falls back to hardcoded-rules-only when schema is unavailable. * PDX-466: fix(test): replace no-explicit-any with typed ValidateFunction import in NX_SCHEMA tests RCA: ESLint no-explicit-any rule rejected the any parameter type used for the schemaOverride parameter in AJV schema override tests; the eslint-disable comment was positioned on the wrong line. Fix: Added import type { ValidateFunction } from ajv/dist/2020.js and replaced all any usages with properly typed ValidateFunction and a narrow IssueShape type alias for the return value. * PDX-466: fix(mcp): address Copilot review comments on nitrox-ajv-schema-validation RCA: Copilot flagged incorrect 'in parallel' wording (validation is synchronous/sequential), an overly broad ERROR severity mapping in ajvErrorToIssue (MIN_ITEMS/MINIMUM/MAXIMUM should be WARNING), and broken markdown rendering of NX_SCHEMA_* in docs (underscores parsed as italic markers). Fix: Reworded tool description and docs to 'sequential' passes; narrowed ERROR set to REQUIRED and TYPE only; fixed NX_SCHEMA_* heading and inline text with backtick quoting; updated docs table to show MIN_ITEMS as WARNING.
…nstall tag (#160) * PDX-467: chore(docs): bump version to 1.5.0 stable and remove @beta install tag RCA: The 1.5.0 release is ready for stable promotion; all docs and install commands still referenced the @beta dist-tag and pre-release version string. Fix: Updated package.json and server.json to 1.5.0, removed @beta from all install commands in README.md, docs/mcp.md, and docs/mcp-pilot-guide.md, and updated the stale-cache unit test to use the latest channel. * PDX-467: test(mcp): make stale-cache test release-agnostic by deriving channel at runtime RCA: Hardcoded version and channel values in the stale-cache test will fail once the branch version cycles back to a prerelease (beta/rc) after the 1.5.0 stable release. Fix: Derive currentVersion and channel from the running version at test time, mirroring the pattern used in the fresh-cache test, so the test remains valid across any semver channel.
RCA: Four issues identified in post-merge Copilot review: AJV error mapping for additionalProperties/required used instancePath (parent object) instead of err.params, NITROX_CATALOG_SOURCE.json had an exposed repo field and missing schemasUpdated key, the provardx-cli bin alias was absent making npx resolution ambiguous, and prompt counts in README/docs were stale at 7 (now 11). Fix: ajvErrorToIssue now special-cases additionalProperties and required keywords to pull field/applies_to from err.params; committed JSON normalised to stable shape; package.json adds provardx-cli bin alias; README.md and docs/mcp.md updated to 11 MCP prompts.
…ies and required errors RCA: Copilot review noted that the new ajvErrorToIssue params-based mapping for additionalProperties and required keywords had no test coverage for the field and applies_to values, leaving the mapping free to silently regress. Fix: Expanded NX_SCHEMA_ADDITIONAL_PROPERTIES test to assert field and applies_to equal the extra property name; added NX_SCHEMA_REQUIRED case asserting field and applies_to equal the missing property name; added applies_to/field assertions to the NX_SCHEMA_TYPE test.
…ropagation before MCP publish RCA: MCP registry publish failed with 404 because npm package was not yet indexed when mcp-publisher ran immediately after npm publish. Fix: Skip npm publish if version already exists on npm, and poll npm registry for up to 3 minutes before running mcp-publisher publish. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…spatch Slack notification RCA: On workflow_dispatch, RELEASE_BODY is empty and the git log range resolved to zero commits because git describe returned the current tag when HEAD was exactly at the tag, so the Slack message always showed "No notable changes extracted." Fix: For workflow_dispatch, fetch the release body via gh api for the version tag before falling back to git log extraction. Also fix the git log fallback to use HEAD^ so it finds the previous tag correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tartup tuning (PDX-469) RCA: Agents with limited context windows hit budget limits on startup because full tool descriptions and schema metadata consume hundreds of tokens per tool. Customers also reported wasted context when only a subset of ProvarDX tools were relevant to their workflow (e.g. NitroX-only or automation-only sessions). Fix: Add PROVAR_MCP_SCHEMA_MODE=compact that switches all 19 tool descriptions to short summaries via desc() helper; add PROVAR_MCP_TOOLS env var that restricts which tool groups register at startup via TOOL_GROUPS + parseActiveGroups(). Covers nitrox, automation, qualityhub, validation, authoring, inspect, connection, rca groups. Add 13 unit tests and Agent performance tuning section in docs/mcp.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…--profile to smoke script RCA: parseActiveGroups returned an empty Set for inputs like PROVAR_MCP_TOOLS="," which caused no tool groups to register (silent outage). provardx_ping message param was not routed through desc(), making the docs/mcp.md claim "every parameter is replaced" inaccurate. quality_threshold compact desc said "int" but the Zod schema uses z.number(). Smoke script TOTAL_EXPECTED was hardcoded so --profile had no way to adjust the expected count for partial runs. Fix: parseActiveGroups now checks groups.size===0 post-filter and returns null (all groups) with a warn log. provardx_ping message routes through desc(). projectValidateFromPath quality_threshold compact desc changed from "int" to "number". docs/mcp.md line 484 softened from "every parameter" to "most". mcp-smoke.cjs gains --profile flag, inGroup() helper, PROVAR_MCP_TOOLS passthrough to server env, dynamic expectedCount, and group-conditional callTool wrappers. Adds desc() unit tests and "," / ",," edge cases to startupTuning.test.ts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s for PR #170 RCA: CLAUDE.md requires docs updates for env var and tool description changes; PR #170 added PROVAR_MCP_SCHEMA_MODE and PROVAR_MCP_TOOLS support with an overclaiming compact description sentence Fix: Changed tool descriptions and parameter descriptions to most tool and parameter descriptions to accurately reflect that provardx_ping message param is not routed through desc() helper Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
RCA: Version bump required for the 1.5.1 release cycle covering PDX-468 through PDX-475. Fix: Update package.json and server.json to 1.5.1 in sync per CLAUDE.md convention. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…p-tuning PDX-468/469: feat(mcp): PROVAR_MCP_SCHEMA_MODE compact descriptions + PROVAR_MCP_TOOLS group filtering
…re to validation tools RCA: Iterative fix-validate loops re-emit full violation inventories on every call, compounding token cost with no stop signal; agents have no way to know when to stop iterating or which violations changed since the prior run. Fix: Add detail=summary|standard|full, baseline_run_id diff mode (returns only added/resolved violations), and completeness_score/recommended_next_action to all four validation tools. New utilities: detailLevel.ts, validationScore.ts, validationDiff.ts. 79 unit tests across validationScore, validationDiff, testSuiteValidate, testPlanValidate. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
RCA: Multiple correctness and code quality issues identified in review
of the detail/diff/completeness PDX-470/471/473 implementation.
Fix: - generateRunId: add random suffix to prevent sub-millisecond collisions
- testSuiteValidate: collect full violation hierarchy (recursive helper)
- All three validate tools: load baseline before saveRun to prevent
eviction race; call hasAnyRun before saveRun for first-run heuristic
- testCaseValidate: include best_practices_violations in diff snapshot;
extract resolveBaseResult helper to reduce handler complexity to 17
- projectValidateFromPath: omit run_id when save_results=false;
extract classifyError helper to reduce handler complexity to 18
- Remove dead code: delete unused descHelper.ts
- Tests: update run_id regex, add 8 TC tests, add detailLevel.test.ts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
RCA: Five correctness issues remained after the Copilot follow-up commit: Bug 5 — tcStorageDir/suiteStorageDir both wrote to same path allowing cross-tool baseline collisions; Bug 7 — computeDiff Map collapsed duplicate violations; Bug 8 — 120-char truncation caused false-equal keys; Bug 9 — calcNextAction returned stop even when quality/BP violations remained; Missing AC — include_plan_details and max_* params not marked @deprecated. Fix: - Namespace storage dirs (testcase/, testsuite/) to prevent cross-tool baseline collisions — computeDiff now uses multiset (counts per key) so duplicate violations are distinct events — remove 120-char message truncation — calcNextAction gains remainingViolationCount param (default 0); stop only fires when score=100 AND count=0 — all three tools pass currentViolations.length — projectValidateFromPath marks include_plan_details/max_uncovered/max_violations @deprecated — 2 multiset tests, 2 secondary-check tests, updated TC test, 7 projectValidate tests for run_id, detail=summary, BASELINE_NOT_FOUND, diff round-trip Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eness_score for PR #168 RCA: CLAUDE.md requires docs updates for every PR that adds or modifies tool parameters; PR #168 added detail, baseline_run_id, run_id, completeness_score, and recommended_next_action to 4 validation tools without updating docs/mcp.md Fix: Updated provar_testcase_validate, provar_testsuite_validate, provar_testplan_validate, and provar_project_validate docs with new input params and output fields; added BASELINE_NOT_FOUND error code; marked include_plan_details/max_uncovered/max_violations as deprecated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Req: Agent tools must be preventable from running unchecked agentic loops that exhaust context without returning to the user. Observability tooling also needs per-call token cost signals to track LLM usage across sessions. Fix: PROVAR_MCP_MAX_TOOL_DEPTH caps tool calls per MCP session (default 50) with TOOL_BUDGET_EXCEEDED errors; PROVAR_MCP_EMIT_TOKEN_META appends a _meta token-attribution block to structuredContent (PDX-475). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR expands the MCP server with response-shaping, validation run diffing, token/depth metadata, NitroX catalog metadata, zero-install startup support, and updated release/docs workflows.
Changes:
- Adds MCP utilities for detail levels, field masks, validation scoring/diff persistence, token metadata, and compact descriptions.
- Extends validation/inspection/Quality Hub tools with detail/fields/run metadata and adds tool grouping/depth-guard server behavior.
- Updates packaging, release automation, zero-install docs/bin entrypoint, NitroX catalog fetching, tool-guide documentation, and broad unit coverage.
Reviewed changes
Copilot reviewed 44 out of 44 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| test/unit/mcp/validationScore.test.ts | Adds unit coverage for validation scoring and next-action decisions. |
| test/unit/mcp/validationDiff.test.ts | Adds unit coverage for run IDs, persisted baselines, and violation diffs. |
| test/unit/mcp/updateChecker.test.ts | Adjusts update-cache tests for stable release/channel handling. |
| test/unit/mcp/tokenMeta.test.ts | Adds tests for depth guard and token metadata attachment. |
| test/unit/mcp/testSuiteValidate.test.ts | Adds tests for suite detail levels, run IDs, diffs, and next actions. |
| test/unit/mcp/testPlanValidate.test.ts | Adds tests for plan detail levels and completeness/next-action fields. |
| test/unit/mcp/testCaseValidate.test.ts | Adds tests for testcase detail levels, run IDs, diffs, and next actions. |
| test/unit/mcp/startupTuning.test.ts | Adds tests for compact descriptions and tool group parsing/registration. |
| test/unit/mcp/server.test.ts | Adds tests for NitroX catalog source metadata resource helper. |
| test/unit/mcp/qualityHubTools.test.ts | Adds tests for detail and field filtering on selected Quality Hub tools. |
| test/unit/mcp/projectValidateFromPath.test.ts | Adds tests for project validation detail, run IDs, and diff responses. |
| test/unit/mcp/projectInspect.test.ts | Adds tests for project inspect detail/field filtering and path errors. |
| test/unit/mcp/nitroXTools.test.ts | Adds AJV schema-rule tests for NitroX validation. |
| test/unit/mcp/fieldMask.test.ts | Adds unit coverage for sparse field masking. |
| test/unit/mcp/detailLevel.test.ts | Adds unit coverage for response detail shaping. |
| test/unit/mcp/connectionTools.test.ts | Adds field-mask tests for connection listing. |
| test/unit/bin/mcpStart.test.ts | Adds argument-validation tests for the new zero-install bin entrypoint. |
| src/mcp/utils/validationScore.ts | Adds completeness scoring and next-action helpers. |
| src/mcp/utils/validationDiff.ts | Adds run persistence and violation diff helpers. |
| src/mcp/utils/tokenMeta.ts | Adds depth guard, token estimation, and optional metadata attachment. |
| src/mcp/utils/fieldMask.ts | Adds field-list parsing and sparse response masking. |
| src/mcp/utils/detailLevel.ts | Adds summary/standard/full response shaping helper. |
| src/mcp/update/updateChecker.ts | Normalizes numeric literals in update checker constants. |
| src/mcp/tools/testSuiteValidate.ts | Adds suite detail levels, run persistence, diffs, and next-action fields. |
| src/mcp/tools/testPlanValidate.ts | Adds plan detail levels and completeness/next-action fields. |
| src/mcp/tools/testPlanTools.ts | Routes tool descriptions through compact-description helper. |
| src/mcp/tools/testCaseValidate.ts | Adds testcase detail levels, run persistence, diffs, and next-action fields. |
| src/mcp/tools/testCaseStepTools.ts | Routes step-edit descriptions through compact-description helper. |
| src/mcp/tools/testCaseGenerate.ts | Routes generation descriptions through compact-description helper. |
| src/mcp/tools/rcaTools.ts | Routes RCA tool descriptions through compact-description helper. |
| src/mcp/tools/qualityHubTools.ts | Adds detail/fields filtering to selected Quality Hub responses and compact descriptions. |
| src/mcp/tools/qualityHubApiTools.ts | Routes Quality Hub API tool descriptions through compact-description helper. |
| src/mcp/tools/propertiesTools.ts | Routes properties tool descriptions through compact-description helper. |
| src/mcp/tools/projectValidateFromPath.ts | Adds project detail levels, run persistence, diffs, and next-action fields. |
| src/mcp/tools/projectInspect.ts | Adds detail and field filtering to project inspection. |
| src/mcp/tools/pageObjectValidate.ts | Routes page-object validation descriptions through compact-description helper. |
| src/mcp/tools/pageObjectGenerate.ts | Routes page-object generation descriptions through compact-description helper. |
| src/mcp/tools/descHelper.ts | Adds environment-controlled compact description helper. |
| src/mcp/tools/defectTools.ts | Routes defect tool descriptions through compact-description helper. |
| src/mcp/tools/connectionTools.ts | Adds field filtering to connection listing and compact descriptions. |
| src/mcp/tools/automationTools.ts | Routes automation tool descriptions through compact-description helper. |
| src/mcp/tools/antTools.ts | Routes ANT tool descriptions through compact-description helper. |
| src/mcp/server.ts | Adds tool groups, depth-guard middleware, catalog/tool-guide resources, and catalog source helper. |
| src/mcp/prompts/index.ts | Registers additional guide prompts. |
| server.json | Updates MCP registry package version to 1.5.1. |
| scripts/fetch-nitrox-packages.cjs | Adds release helper to fetch NitroX catalog/schemas and source metadata. |
| README.md | Updates install/setup docs for stable and zero-install usage. |
| package.json | Updates version, AJV dependency, bin entries, prepack, and compile copy steps. |
| docs/PROVAR_TOOL_GUIDE.md | Adds user-facing MCP tool-selection guide. |
| docs/NITROX_CATALOG_SOURCE.json | Adds fallback NitroX catalog source metadata. |
| docs/mcp-pilot-guide.md | Updates install instructions from beta to stable package. |
| bin/mcp-start.js | Adds zero-install MCP startup entrypoint. |
| .gitignore | Tracks Claude commands while keeping local Claude work dirs ignored. |
| .github/workflows/DeployManual.yml | Updates publish workflow for idempotent npm publish and registry notes. |
| .github/workflows/CI_Execution.yml | Updates QualityOrchestrator action reference. |
| .claude/commands/git-workflow.md | Adds local Claude Code git workflow command documentation. |
Comments suppressed due to low confidence (3)
src/mcp/tools/nitroXTools.ts:14
ajv/dist/2020exposes the 2020 validator as the default export, not a namedAjv2020export. With the added AJV dependency this import will fail TypeScript compilation/runtime module loading; import the default class (while keeping the type-only exports) before constructing the validator.
import { Ajv2020, type ValidateFunction, type ErrorObject } from 'ajv/dist/2020.js';
package.json:155
- The compile task now copies
docs/PROVAR_TOOL_GUIDE.mdintolib/mcp/docs, but that source file is not listed in thewireit.compile.filesinputs. Wireit can therefore reuse a cached compile after the guide changes and ship a stale or missinglib/mcp/docs/PROVAR_TOOL_GUIDE.md; add the guide to the tracked files list alongside the other copied docs.
"command": "tsc -p . --pretty --incremental && shx mkdir -p lib/mcp/rules && shx cp src/mcp/rules/*.json lib/mcp/rules/ && shx mkdir -p lib/mcp/docs && shx cp docs/PROVAR_TEST_STEP_REFERENCE.md lib/mcp/docs/ && shx cp docs/NITROX_COMPONENT_CATALOG.md lib/mcp/docs/ && shx cp docs/NITROX_CATALOG_SOURCE.json lib/mcp/docs/ && shx cp docs/PROVAR_TOOL_GUIDE.md lib/mcp/docs/",
.github/workflows/DeployManual.yml:46
npm publishruns the newprepackscript, and that script requiresGITHUB_TOKENorGH_TOKENto fetch the privatefactPackagescatalog/schemas. This publish step only exposesNODE_AUTH_TOKEN, so release builds will fall back to the placeholder catalog source and will not bundle refreshed NitroX schemas unless a GitHub token is passed into this environment.
npm publish --tag "$TAG" --access public
fi
env:
TAG: ${{ github.event.inputs.tag || 'latest' }}
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+148
to
+149
| const completeness_score = calcCompletenessScore(summary.test_cases_valid, summary.total_test_cases); | ||
| const recommended_next_action = calcNextAction(completeness_score, false); |
| function collectAllViolations(result: SuiteResult): DiffableViolation[] { | ||
| const all: DiffableViolation[] = [...(result.violations as unknown as DiffableViolation[])]; | ||
| for (const tc of result.test_cases) { | ||
| all.push(...(tc.issues as unknown as DiffableViolation[])); |
|
|
||
| const shaped = shapeResponse(result, include_plan_details, max_uncovered, max_violations); | ||
| const response = { requestId, ...shaped }; | ||
| const currentViolations = result.project_violations as unknown as DiffableViolation[]; |
Comment on lines
+291
to
+298
| // Load baseline BEFORE saving to prevent eviction of the requested baseline | ||
| const baseline = | ||
| save_results !== false && baseline_run_id !== undefined && baseline_run_id !== '' | ||
| ? loadBaselineViolations(storageDir, baseline_run_id) | ||
| : null; | ||
|
|
||
| const hasBaseline = save_results !== false ? hasAnyRun(storageDir) : false; | ||
|
|
Comment on lines
+460
to
+464
| it('diff mode returns added/resolved/unchanged_count when baseline exists', () => { | ||
| // First call to establish baseline | ||
| const first = server.call('provar_testsuite_validate', { | ||
| suite_name: 'BaselineSuite', | ||
| test_cases: [TC_LOGIN], |
Comment on lines
+146
to
+147
| for (const [group, registrars] of Object.entries(TOOL_GROUPS)) { | ||
| if (activeGroups === null || activeGroups.has(group)) { |
Comment on lines
+347
to
+348
| const usePlanDetails = include_plan_details || detail === 'full'; | ||
| const shaped = shapeResponse(result, usePlanDetails, max_uncovered, max_violations); |
…uction RCA: PR #153 (PDX-479 regression) shipped two artifacts — the provar.guide.orchestration prompt's author-test flow and the PROVAR_TOOL_GUIDE.md "I want to write a new test" section — that both steered LLMs toward generate-empty-then-step_edit-per-step authoring. PDX-480 confirmed locally that disabling these restores correct generation. This patch keeps the helpful guidance but rewrites it to recommend a single provar_testcase_generate call carrying the full step tree; step_edit is now explicitly marked amend-only. Fix: Rewrote the author-test flow in src/mcp/prompts/guidePrompts.ts and the matching section in docs/PROVAR_TOOL_GUIDE.md to mandate single-call construction. Split the orchestration prerequisite graph so provar_testcase_generate and provar_testcase_step_edit appear as distinct entry points with construct-vs-amend annotations. Added construct-vs-amend callouts to docs/mcp.md tool sections. Added pilot-guide Scenario 12 covering the multi-scenario single-call expectation. Added unit tests asserting the canonical phrasing (and absence of "repeat per step") plus a multi-scenario snapshot test that drives 12 steps through provar_testcase_generate in one call and asserts consecutive testItemIds, preserved scenario markers, consistent assert API IDs, and correct UiWithScreen nesting. Added scripts/pdx-481-validate.cjs for direct JSON-RPC verification of the fix at the protocol surface.
PDX-478: fix(mcp): parse .testproject connections and resolve env via associations
RCA: Copilot review flagged two real defects in the PDX-481 regression-guard test and validation harness: (1) the "repeat as needed" assertion in guidePrompts.test.ts had an OR-clause that short-circuited to true because "amend" appears repeatedly elsewhere in the flow, making the assertion a no-op; (2) the prerequisite-graph regex used unbounded [^\n]* so it could match unrelated tokens between the two words on the same line; and one style nit on validate.cjs using a ternary as a statement. Fix: Made the "repeat as needed" assertion unconditional so it actually protects against the anti-pattern phrasing being reintroduced. Tightened the prerequisite-graph regex to require the exact annotation punctuation (provar_testcase_generate\s*\(construct / provar_testcase_step_edit\s*\(amend) so it cannot pass on unrelated text. Replaced the ternary-as-statement counter in pdx-481-validate.cjs with an if/else block. Also added scripts/pdx-481-trace.cjs (the JSON-RPC trace harness used to capture the patched-vs-unpatched prompt-flow side-by-side that was posted to PDX-479 as concrete regression evidence). All gate checks pass: 1118 mocha tests, lint clean, validation 9/9.
…-test-guidance PDX-481: fix(prompts): rewrite author-test flow to single-call construction
… contract RCA: PDX-479 surfaced that authoring guidance lives in three places (prompts, resource, tool descriptions) and a regression in any one of them — like the multi-call author-test flow that shipped in PR #153 — can drift the LLM away from correct test case construction. PDX-481 fixed the prompts + resource but the tool descriptions themselves still carried no construct-vs-amend contract. The steps[] field description was just "Ordered list of test steps" with no anti-pattern protection. If a future upstream prompt re-introduces the multi-call pattern, only the tool description can push back at the call site — and it currently says nothing about it. Fix: Added a three-line construction contract to the top of testCaseGenerate.ts TOOL_DESCRIPTION (single-call pattern, step_edit is for AMENDING, stop-and-assemble guidance for the common mistake). Tightened the steps[] field description to call out the FULL/COMPLETE step tree in one call and warn against the multi-call append pattern. Mirrored the contract in testCaseStepTools.ts: the step_edit description now self-identifies as AMENDMENT-ONLY, rejects construction usage, points agents at provar_testcase_generate for new test cases, and spells out the structural defects (dropped scenarios, flat asserts, inconsistent step types) from misuse. Added 6 regression-guard unit tests asserting the canonical phrasing in both tool descriptions. Added scripts/pdx-482-validate.cjs (13 protocol-surface assertions, 13/13 PASS) for direct JSON-RPC verification. Full gate green: 1127 mocha tests, lint clean, compile clean.
RCA: An adversarial review of the original PDX-482 commit identified three critical defects in the construct/amend contract: (1) PROVAR_MCP_SCHEMA_MODE=compact silently swapped the description for a contract-free one-liner — the LLM would never see the contract in compact mode, making it a regression highway; (2) the regex /step_edit[^.]*not for CONSTRUCTING|CONSTRUCTING[^.]*not/i had a false-positive that could pass on hostile rewordings like "constructing...not via generate"; (3) no test asserted the contract appears EARLY in the description, so a future refactor could move it down where LLM attention is lower. Additionally the step_edit test used an OR-clause across the three structural defects so dropping two of them would silently dilute the warning.
Fix: (1) Compact form on provar_testcase_generate now reads "Generate a Provar test case in ONE call with the FULL steps[] tree. Do NOT call with steps=[] then append via provar_testcase_step_edit (step_edit is for AMENDING existing test cases, not for CONSTRUCTING new ones)." — protocol-surface validator now spawns the server twice (standard + PROVAR_MCP_SCHEMA_MODE=compact) and runs 6 contract assertions against the compact form. (2) Replaced the false-positive regex with literal includes('not for CONSTRUCTING one from scratch') in both the unit test and the validator — locked on the canonical phrasing. (3) Added a leading-position assertion in both the unit test and validator: indexOf('Construction pattern') < 200 to prevent silent drift. (4) Tightened the step_edit "structural defects" test from OR-clause to three separate AND-style assertions on "dropped scenarios", "flat asserts", and "inconsistent step types" — dropping any one now fails the test. Gate: 1129 mocha tests, lint clean, validator 20/20 (was 13/13) covering both schema modes.
RCA: The published Provar-vs-Playwright comparison deck references this script in its methodology appendix as the reproduction recipe for the catalog-token figures. The script lived only in the PDX-482 worktree, so external readers (analysts, customers, prospects) could not actually reproduce the numbers — undermining the methodology slide's credibility. Fix: Promote scripts/token-measure-vs-playwright.cjs to develop as an independent chore. Script spawns both MCP servers (Provar local via bin/mcp-start.js, Playwright via npx -y @playwright/mcp), sends identical initialize → tools/list JSON-RPC pairs, and reports catalog size in characters and approximate tokens (chars/4). Provar MCP runs three configurations (STANDARD / COMPACT / AUTHORING) to demonstrate the PROVAR_MCP_SCHEMA_MODE + PROVAR_MCP_TOOLS levers. Also issues a representative browser_navigate + browser_snapshot against example.com to capture Playwright's per-interaction baseline. No source-tree changes; no test or behaviour impact.
PDX-0: chore(scripts): add token-measure-vs-playwright.cjs
…scriptions PDX-482: feat(mcp): harden testcase tool descriptions for single-call contract
…rate RCA: PDX-482 hardened the testcase_generate tool description, but the contract is passive — an LLM that ignores it pays no price. Calling generate with steps:[]+dry_run:false+output_path still wrote a TODO-only skeleton file, reproducing the PDX-479 regression class. The contract was enforced only by the agent's reading comprehension. Fix: Add an active runtime guard that rejects the exact shape (steps:[] + non-dry-run + output_path) with STEPS_REQUIRED before any side effects. details.suggestion tells the LLM to pass the FULL step tree in one call and notes the dry_run=true escape hatch for skeleton inspection. Other empty-steps shapes (dry-run preview, no output_path) remain allowed. Docs, smoke entry, and pdx-482-validate.cjs all updated.
RCA: PDX-482 hardened the description bodies for provar_testcase_generate and provar_testcase_step_edit, but many MCP clients (Claude Desktop tool-picker chips, Cursor audit pane, inline tool-call references in chat threads) render only the title field. The previous bare titles ("Generate Test Case", "Edit Test Case Step") gave zero PDX-479 protection to agents reading only the chip-level surface.
Fix: Updated the two tool titles to "Generate Test Case (full steps in one call)" (43 chars) and "Amend Existing Test Case Step" (29 chars). Both clear the cross-client chip-render comfort threshold (<= 50 chars). Extended MockMcpServer in the two test files to capture title alongside description; added unit assertions for the canonical phrasing and length. Extended scripts/pdx-482-validate.cjs with a titleAssertions helper run in both standard and compact schema modes (titles are mode-independent but asserting in both surfaces drift early). Updated docs/mcp.md tool sections and docs/mcp-pilot-guide.md Scenario 12 to mention the title-level contract.
…stomer docs
RCA: The initial PDX-483 commit added internal Jira IDs (PDX-479, PDX-481, PDX-482, PDX-483) and engineering-process phrasing ("regression class", "before this guard") to docs/mcp.md and docs/mcp-pilot-guide.md. These are customer-facing surfaces — pilot guides and MCP reference docs — and must not leak internal ticket plumbing or dev-process language to integrators.
Fix: Rewrite the new STEPS_REQUIRED section in docs/mcp.md and the Defense-in-depth block in docs/mcp-pilot-guide.md in terms of observable behaviour and the API contract. Pre-existing PDX-479 references in the Scenario 12 background and bug-filing line are out of scope for this PR and left as-is.
RCA: The published customer-facing MCP guides (docs/mcp.md and docs/mcp-pilot-guide.md) leaked internal Jira identifiers — "PDX-464", "PDX-479" — and a corresponding internal-only version reference ("1.5.0 (PDX-479)"). These ticket IDs are only meaningful inside the Provar engineering org and should not appear in docs that ship to pilots and customers.
Fix: Rewrite the three offending passages without losing technical meaning. (1) docs/mcp.md catalog-source note replaces "(dev build or pre-PDX-464 release)" with "(dev build or an older release that predates this metadata)". (2) docs/mcp-pilot-guide.md Scenario 12 Background drops "in 1.5.0 (PDX-479)" and describes the regression as "previously observed". (3) docs/mcp-pilot-guide.md Scenario 12 FAIL action replaces "file against PDX-479 (or its successor)" with "report it to the Provar team". No tool, schema, or behaviour changes — docs only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
…ired-guard PDX-483: add STEPS_REQUIRED runtime guard on testcase_generate
…refs-from-customer-docs PDX-0: docs(mcp) — scrub internal ticket refs from customer docs
…ol-titles RCA: PR #176 (PDX-483 runtime guard) landed on develop while PDX-484 was in flight. Both PRs touched src/mcp/tools/testCaseGenerate.ts, docs/mcp.md, docs/mcp-pilot-guide.md, scripts/pdx-482-validate.cjs, and the test file. Git auto-merged 5 of 6 files cleanly; scripts/pdx-482-validate.cjs needed manual resolution in the header comment block where both branches rewrote the file purpose. Fix: Combined the two header comments into a unified PDX-482/PDX-483/PDX-484 explanation that names all three layers (description contract, runtime guard, title contract). All other auto-merges verified by compile + 1140-passing tests + lint. The script header will be scrubbed of PDX-XXX refs and the file renamed in a follow-up chore PR per the no-tickets-in-script-paths convention.
…enforcement lint
RCA: Three scripts under scripts/ were named after the tickets that introduced them (pdx-481-trace.cjs, pdx-481-validate.cjs, pdx-482-validate.cjs). Ticket-prefixed filenames anchor the repo to internal Jira IDs, age poorly once the ticket closes, and surface in customer-visible artifacts (CI logs, PR diffs, repo browsing). The provar_testcase_generate STEPS_REQUIRED error message also referenced PDX-479 in the message body returned to the MCP client, leaking internal nomenclature into a customer-facing surface.
Fix: Renamed pdx-481-trace.cjs -> authoring-flow-trace.cjs, pdx-481-validate.cjs -> authoring-guidance-validate.cjs, pdx-482-validate.cjs -> construction-contract-validate.cjs (git mv preserves history). Updated each script's header comment, internal clientInfo identifiers, log strings, and console.log output to drop PDX-XXX references. Reworded the STEPS_REQUIRED error message in testCaseGenerate.ts to describe the contract violation in behavioural terms ("Constructing a test case requires the full step tree in a single call") instead of citing a ticket. Added scripts/lint-script-names.cjs which fails the lint chain if any file under scripts/ matches ^pdx[-_]?\d+ (case-insensitive); wired into wireit as a dependency of yarn lint. Documented the convention in CLAUDE.md.
…view RCA: Copilot review on PR #179 flagged two related gaps. (1) scripts/lint-script-names.cjs used readdirSync without recursion, so a future scripts/tmp/pdx-NNN.cjs nested file would bypass the rule despite the documentation saying "under scripts/". (2) The wireit `files` glob for `lint:script-names` was `scripts/*`, which excludes subdirectories, so wireit's cache would not invalidate on changes to nested files even if the check itself were recursive. Fix: Rewrote the offender walk as a recursive `walk(dir)` that traverses subdirectories and reports relative paths from `scripts/`. Updated the wireit `files` glob to `scripts/**/*` so cache invalidation covers nested additions. Tightened CLAUDE.md wording to call out "anywhere under scripts/ (including nested subdirectories)" and gave a nested-path example in the rejected list. Validated three states: clean (exit 0), top-level offender (exit 1), nested offender (exit 1 reporting the full nested path).
…l-titles PDX-484: carry construct-vs-amend contract into MCP tool titles
… after PR #177 landed RCA: PR #177 (PDX-484 tool-title hardening) landed on develop while this cleanup branch was open. Both branches modified scripts/pdx-482-validate.cjs — this branch renamed it to scripts/construction-contract-validate.cjs and scrubbed its PDX-XXX header references, while PR #177 added a titleAssertions() helper and rewrote the header on the old filename. Git auto-merged the title-assertion content into the renamed file (rename detection worked) but conflicted on the header comment block. Fix: Kept the clean header from this branch (no PDX-XXX refs) and merged in the new title-contract-pass paragraph reworded without the PDX-484 prefix ("Title-contract pass:" instead of "PDX-484 — title contract"). Also scrubbed the "(PDX-484)" suffixes from titleAssertions() assertion labels and the "PDX-484:" prefix from the two comment lines that call titleAssertions(toolList, record) — keeping the script free of ticket IDs to match the convention. src/mcp/tools/testCaseGenerate.ts auto-merged cleanly: the STEPS_REQUIRED error-message scrub from this branch and the title field addition from PDX-484 coexist. Verified: yarn compile clean, yarn lint clean (includes lint:script-names), 1140 unit tests pass, construction-contract-validate.cjs 40/40 PASS (12 new title-contract assertions now exercised against the merged tool).
chore(scripts): rename ticket-prefixed scripts + enforce naming convention
…ouple read-only diff from save_results RCA: Four correctness gaps surfaced in the 1.5.1 PR #172 release review that either let recommended_next_action="stop" fire while violations remained, or broke read-only diff. (B1) testPlanValidate called calcNextAction(score, false) without the remainingViolationCount arg (defaults to 0), so a plan whose test cases were structurally valid returned "stop" even when PLAN-META-* or BP violations remained. (B2) testSuiteValidate.collectAllViolations collected tc.issues but not tc.best_practices_violations, so a suite with BP-only failures returned "stop" and an empty diff. (B3) projectValidateFromPath used currentViolations (which only contains top-level project_violations) for the stop decision, so a project with nested plan/suite violations returned "stop" when the project root was clean. (B4) baseline load was gated on save_results !== false, so a valid baseline_run_id returned BASELINE_NOT_FOUND in read-only mode — save_results should control persistence of the current run only, not whether existing runs can be read. (B5) docs/PROVAR_TOOL_GUIDE.md was copied into lib/mcp/docs/ during compile but was missing from wireit.compile.files inputs, so wireit could ship a stale guide after edits. Fix: Added countAllPlanViolations and countSuiteViolations helpers in testPlanValidate.ts and pass the all-level count to calcNextAction. Added tc.best_practices_violations to collectAllViolations in testSuiteValidate.ts. Added countAllProjectViolations helper in projectValidateFromPath.ts; the diff snapshot still uses project_violations to keep response shape stable, but the stop decision now uses the all-level count. Decoupled baseline load and hasAnyRun from save_results in projectValidateFromPath.ts so read-only diff works (matches the testCaseValidate.ts pattern). Added docs/PROVAR_TOOL_GUIDE.md to wireit.compile.files. Updated stale "stop when score=100" assertions in testPlanValidate.test.ts and testSuiteValidate.test.ts that were locking in the bug, and added new B1/B2/B3/B4 coverage. Validation: yarn compile clean, full mocha 1143 passing / 0 failing, yarn lint clean, 55/55 smoke pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
9 tasks
…ff scoping, test isolation RCA: Four high-priority hardening items surfaced in the 1.5.1 PR #172 release review. (H1) parseActiveGroups in server.ts already warned on empty comma-only PROVAR_MCP_TOOLS values but did not validate group names against TOOL_GROUPS keys — a typo like PROVAR_MCP_TOOLS=validaton was silently ignored and the server started with only provardx_ping registered. (H2) The DeployManual.yml publish job's "Install dependencies and build" step did not export GITHUB_TOKEN, so prepack's fetch-nitrox-packages.cjs fell back to the bundled NitroX catalog/schemas on every release; PDX-463/464's "always fetch latest from main" guarantee did not actually deliver in CI. (H3) validationDiff storage was shared across tools and contexts under ~/.provardx/validation/<sub>, with no per-context scoping — a run_id from project A could be fed to a validate call against project B and would diff against unrelated data without any indication. (H4) testSuiteValidate.test.ts wrote to the real ~/.provardx/validation/testsuite path because the unit test did not stub os.homedir, polluting the developer/CI home and exposing tests to pre-existing run state. Fix: (H1) parseActiveGroups now intersects requested groups with Object.keys(TOOL_GROUPS), warns on unknown names, and falls back to null (all groups) when nothing matches — typos are loud and never produce an empty Provar tool surface. (H2) Added GH_TOKEN: secrets.GITHUB_TOKEN env to the build step of DeployManual.yml; the auto-provided token has read access to factPackages so fetch-nitrox-packages.cjs now pulls fresh schemas on every release. (H3) validationDiff now exposes computeContextHash(toolTag, context) and records a context_hash on each run; loadBaselineViolations rejects a baseline_run_id whose context_hash does not match the calling context, preventing cross-context diffs. Added resolveValidationDir(subdir) which honors a new PROVAR_MCP_VALIDATION_DIR env override (falling back to ~/.provardx/validation/<subdir>) for restricted CI/dev environments. testCaseValidate, testSuiteValidate, and projectValidateFromPath all pass their tool tag + context to saveRun/loadBaselineViolations. (H4) Moved the testSuiteValidate.test.ts before/afterEach hooks INSIDE the describe block — mocha top-level hooks attach to the root suite and run before every test in every file, which leaked the os.homedir stub into auth/rotate.test.ts and other downstream files. Discovered while running the full mocha suite — rotate.test.ts flapped only when testSuiteValidate ran first; scoping the hooks fixes it cleanly. Tests: New startupTuning.test.ts cases assert that PROVAR_MCP_TOOLS=validaton returns null (typo footgun) and that mixed lists keep known names while dropping unknown ones. New validationDiff.test.ts cases cover computeContextHash determinism + per-tool/per-context distinctness, loadBaselineViolations rejection on context_hash mismatch (including legacy records that predate the field), loadBaselineViolations back-compat when no expectedContextHash is provided, and resolveValidationDir defaulting vs PROVAR_MCP_VALIDATION_DIR override. testSuiteValidate.test.ts hooks moved inside the describe to scope the os.homedir stub. Validation: yarn compile clean, full mocha 1155 passing / 0 failing, yarn lint clean, 55/55 smoke pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-correctness PDX-473/471: fix(mcp) — all-level violations in stop decision; read-only diff
…ening-pass-2 PDX-469: chore(mcp) — harden MCP infra: typo guard, release token, diff scoping, test isolation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.